Identifying Literary Texts with Bigrams
نویسندگان
چکیده
We study perceptions of literariness in a set of contemporary Dutch novels. Experiments with machine learning models show that it is possible to automatically distinguish novels that are seen as highly literary from those that are seen as less literary, using surprisingly simple textual features. The most discriminating features of our classification model indicate that genre might be a confounding factor, but a regression model shows that we can also explain variation between highly literary novels from less literary ones within genre.
منابع مشابه
Identifying Speakers and Listeners of Quoted Speech in Literary Works
We present the first study that evaluates both speaker and listener identification for direct speech in literary texts. Our approach consists of two steps: identification of speakers and listeners near the quotes, and dialogue chain segmentation. Evaluation results show that this approach outperforms a rule-based approach that is stateof-the-art on a corpus of literary texts.
متن کاملDoes the Phonology of L1 Show Up in L2 Texts?
The relative frequencies of character bigrams appear to contain much information for predicting the first language (L1) of the writer of a text in another language (L2). Tsur and Rappoport (2007) interpret this fact as evidence that word choice is dictated by the phonology of L1. In order to test their hypothesis, we design an algorithm to identify the most discriminative words and the correspo...
متن کاملMultifractal analysis of sentence lengths in English literary texts
This paper presents analysis of 30 literary texts written in English by different authors. For each text, there were created time series representing length of sentences in words and analyzed its fractal properties using two methods of multifractal analysis: MFDFA and WTMM. Both methods showed that there are texts which can be considered multifractal in this representation but a majority of tex...
متن کاملAnnotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
Characters form the focus of various studies of literary works, including social network analysis, archetype induction, and plot comparison. The recent rise in the computational modelling of literary works has produced a proportional rise in the demand for character-annotated literary corpora. However, automatically identifying characters is an open problem and there is low availability of lite...
متن کاملThe Effect of Authentic and Simplified Literary Texts on the Reading Comprehension of Iranian Advanced EFL Learners
The present quasi-experimental study mainly investigates the role of literature as input for reading comprehension in Iranian EFL classrooms. To be more exact, it investigates the effects of authentic and simplified literary texts on the reading comprehension of Iranian advanced EFL learners. The participants were 35 male and female Iranian EFL learners who were at advanced level, studying in a...
متن کامل